AITopics | learning rate scheduler

Appendix: On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them

Neural Information Processing Systems | Apr-24-2026, 06:50:14 GMT

Suppose we have a non-zero solution θ which is a stationary point of f(θ, t) at the t-th step, and SGD finds θ_t = θ at the t-th step. Theorem 2.2 of Shapiro and Wardi [9] tells us that the learning rate should be small enough for convergence. Obviously, we have η < in practice. As η_t = η_{t+1} does not hold, SGD cannot converge to any non-zero stationary point. The proof is now complete.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Gradient Informed Proximal Policy Optimization

Neural Information Processing Systems | Feb-8-2026, 13:35:45 GMT

We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm.

artificial intelligence, machine learning, optimization problem, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Maryland > Prince George's County > College Park (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Textless NLP -- Zero Resource Challenge with Low Resource Compute

Ramadass, Krithiga, Singh, Abrit Pal, J, Srihari, Kalyani, Sheetal

arXiv.org Artificial Intelligence | Sep-24-2024

The availability of text data for low-resource languages has always been a challenge and transfer learning from multilingual models has its own limitations. End-to-End spoken systems without involving text have received significant attention in the recent years. The Zero-Resource challenge (ZRC) [1] has enabled addressing the low-resource language representation problem and has been a significant driver in this area. In the acoustic unit discovery task for ZRC, high-dimensional input speech data is mapped to its latent representation to …

… Coding (VQ-CPC) [8] as the encoder in our speech processing pipeline. The input audio files are preprocessed and extracted as log-Mel spectrograms. The initial processing involves convolution and normalization layers to extract high-level features. These features are then passed through an auto-regressive network, which predicts future representations of the input based on past information. One of the key characteristics of VQ-CPC is its use of vector quantization as a bottleneck to discretize the continuous embeddings extracted by the autoregressive network into a finite set of discrete codes.
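The vector quantization bottleneck described in this excerpt can be pictured as a nearest-neighbour lookup into a codebook. The following is a minimal sketch under stated assumptions, not the authors' VQ-CPC implementation: the codebook values, the 2-D embeddings, and the helper names `squared_distance` and `quantize` are all hypothetical, and a real system would learn the codebook during training.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    embedding: Sequence[float], codebook: Sequence[Sequence[float]]
) -> Tuple[int, List[float]]:
    """Map a continuous embedding to the index and value of its nearest code."""
    best_index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(embedding, codebook[i]),
    )
    return best_index, list(codebook[best_index])


# Hypothetical codebook with three 2-D codes (assumed values for illustration).
CODEBOOK = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]

# Hypothetical continuous embeddings, one per audio frame.
frames = [[0.1, -0.2], [0.9, 1.3], [-0.8, 1.7]]

# Each frame is replaced by the index of its closest code.
codes = [quantize(frame, CODEBOOK)[0] for frame in frames]
print(codes)
```

The continuous frames become a sequence of integers drawn from a finite set, which is the sense in which the bottleneck discretizes the embeddings.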

architecture, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2409.19015

Country: Asia > India > Tamil Nadu > Chennai (0.05)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Probabilistic learning rate scheduler with provable convergence

Devapriya, Dahlia, Tholeti, Thulasi, Suresh, Janani, Kalyani, Sheetal

arXiv.org Artificial Intelligence | Jul-10-2024

Learning rate schedulers have shown great success in speeding up the convergence of learning algorithms in practice. However, their convergence to a minimum has not been proven theoretically. This difficulty mainly arises from the fact that, while traditional convergence analysis prescribes monotonically decreasing (or constant) learning rates, schedulers opt for rates that often increase and decrease through the training epochs. In this work, we aim to bridge the gap by proposing a probabilistic learning rate scheduler (PLRS) that does not conform to the monotonically decreasing condition, with provable convergence guarantees. In addition to providing detailed convergence proofs, we also show experimental results where the proposed PLRS performs competitively with other state-of-the-art learning rate schedulers across a variety of datasets and architectures.

log 1, rate scheduler, scheduler, (15 more...)

arXiv.org Artificial Intelligence

2407.07613

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Automatic gradient descent with generalized Newton's method

Bu, Zhiqi, Xu, Shiyun

arXiv.org Artificial Intelligence | Jul-2-2024

We propose the generalized Newton's method (GeN) -- a Hessian-informed approach that applies to any optimizer such as SGD and Adam, and covers the Newton-Raphson method as a sub-case. Our method automatically and dynamically selects the learning rate that accelerates the convergence, without the intensive tuning of the learning rate scheduler. In practice, our method is easily implementable, since it only requires additional forward passes with almost zero computational overhead (in terms of training time and memory cost), if the overhead is amortized over many iterations. We present extensive experiments on language and vision tasks (e.g. GPT and ResNet) to showcase that GeN optimizers match the state-of-the-art performance, which was achieved with carefully tuned learning rate schedulers. Code to be released at https://github.com/ShiyunXu/AutoGeN.
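One way to picture how a learning rate might be chosen from additional forward passes alone is a parabola fit along the update direction. The sketch below is an illustration under stated assumptions, not the GeN algorithm as published: the three-point finite-difference fit, the fallback rule, the toy quadratic loss, and the names `parabola_step_size` and `toy_loss` are all hypothetical.

```python
from typing import Callable, List


def parabola_step_size(
    loss: Callable[[List[float]], float],
    weights: List[float],
    direction: List[float],
    probe: float = 0.1,
) -> float:
    """Estimate a step size for the update w - eta * direction.

    Uses three loss evaluations (forward passes) at eta = -probe, 0, +probe
    and fits L(eta) ~ L(0) + slope * eta + 0.5 * curvature * eta**2.
    """

    def loss_at(eta: float) -> float:
        return loss([w - eta * d for w, d in zip(weights, direction)])

    l_minus, l_zero, l_plus = loss_at(-probe), loss_at(0.0), loss_at(probe)
    slope = (l_plus - l_minus) / (2.0 * probe)
    curvature = (l_plus - 2.0 * l_zero + l_minus) / probe ** 2
    if curvature <= 0.0:
        # Assumed fallback: the fit has no minimum, so reuse the probe size.
        return probe
    return -slope / curvature


def toy_loss(w: List[float]) -> float:
    """Hypothetical quadratic loss with its minimum at (1, -1)."""
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 1.0) ** 2


weights = [0.0, 0.0]
gradient = [2.0 * (weights[0] - 1.0), 4.0 * (weights[1] + 1.0)]

eta = parabola_step_size(toy_loss, weights, gradient)
new_weights = [w - eta * g for w, g in zip(weights, gradient)]
print(eta, toy_loss(weights), toy_loss(new_weights))
```

On a quadratic loss the fitted parabola is exact, so the returned step is the best one along the gradient direction; on a real network it would only be a local approximation.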

arxiv preprint arxiv, optimizer, scheduler, (15 more...)

arXiv.org Artificial Intelligence

2407.02772

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Pennsylvania (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
(4 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
(2 more...)

Cyclical Log Annealing as a Learning Rate Scheduler

Naveen, Philip

arXiv.org Artificial Intelligence | Mar-13-2024

A learning rate scheduler is a predefined set of instructions for varying search step sizes during model training. This paper introduces a new logarithmic method using harsh restarting of step sizes through stochastic gradient descent. Cyclical log annealing implements the restart pattern more aggressively, which may allow the use of more greedy algorithms in the online convex optimization framework. The algorithm was tested on the CIFAR-10 image dataset, and appeared to perform comparably to cosine annealing on large transformer-enhanced residual neural networks. Future experiments would involve testing the scheduler in generative adversarial networks and finding the best parameters for the scheduler with more experiments.
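For orientation, the cosine annealing baseline with restarts that this abstract compares against can be sketched as below. This is the standard cosine rule with a fixed cycle length, not the paper's logarithmic formula, which the abstract does not state; the function name `cosine_restart_lr` and the parameter values are assumptions chosen for illustration.

```python
import math
from typing import List


def cosine_restart_lr(
    step: int, cycle_length: int, lr_max: float, lr_min: float = 0.0
) -> float:
    """Cosine annealing from lr_max towards lr_min, restarting every cycle_length steps."""
    position = step % cycle_length  # position inside the current cycle
    cosine = math.cos(math.pi * position / cycle_length)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


# Two cycles of ten steps each (assumed values for illustration).
schedule: List[float] = [cosine_restart_lr(s, 10, 0.1) for s in range(20)]

for step, lr in enumerate(schedule):
    print(step, round(lr, 4))
```

Within each cycle the rate decays smoothly, then jumps back to `lr_max` at the restart, which is the restart pattern the abstract says the log variant applies more aggressively.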

experiment, scheduler, semanticscholar, (10 more...)

arXiv.org Artificial Intelligence

2403.14685

Country: North America > United States > Virginia > Albemarle County > Charlottesville (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Lu, Haoyu, Liu, Wen, Zhang, Bo, Wang, Bingxuan, Dong, Kai, Liu, Bo, Sun, Jingxiang, Ren, Tongzheng, Li, Zhuoshu, Yang, Hao, Sun, Yaofeng, Deng, Chengqi, Xu, Hanwei, Xie, Zhenda, Ruan, Chong

arXiv.org Artificial Intelligence | Mar-11-2024

We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions:

Data Construction: We strive to ensure our data is diverse, scalable and extensively covers real-world scenarios including web screenshots, PDFs, OCR, charts, and knowledge-based content (expert knowledge, textbooks), aiming for a comprehensive representation of practical contexts. Further, we create a use case taxonomy from real user scenarios and construct an instruction-tuning dataset accordingly. The fine-tuning with this dataset substantially improves the model's user experience in practical applications.

Model Architecture: Considering efficiency and the demands of most real-world scenarios, DeepSeek-VL incorporates a hybrid vision encoder that efficiently processes high-resolution images (1024 x 1024) within a fixed token budget, while maintaining a relatively low computational overhead. This design choice ensures the model's ability to capture critical semantic and detailed information across various visual tasks.

Training Strategy: We posit that a proficient Vision-Language Model should, foremost, possess strong language abilities. To ensure the preservation of LLM capabilities during pretraining, we investigate an effective VL pretraining strategy by integrating LLM training from the beginning and carefully managing the competitive dynamics observed between vision and language modalities. Starting with a focus on text, we gradually adjust the ratio to facilitate a balanced integration of both modalities.

arxiv preprint arxiv, dataset, deepseek-vl, (15 more...)

arXiv.org Artificial Intelligence

2403.05525

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > Hawaii (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Gradient Informed Proximal Policy Optimization

Son, Sanghyun, Zheng, Laura Yu, Sullivan, Ryan, Qiao, Yi-Ling, Lin, Ming C.

arXiv.org Artificial Intelligence | Dec-14-2023

We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we introduce the concept of an α-policy that stands as a locally superior policy. By adaptively modifying the α value, we can effectively manage the influence of analytical policy gradients during learning. To this end, we suggest metrics for assessing the variance and bias of analytical gradients, reducing dependence on these gradients when high variance or bias is detected. Our proposed approach outperforms baseline algorithms in various scenarios, such as function optimization, physics simulations, and traffic control environments. Our code can be found online: https://github.com/SonSang/gippo.

analytical gradient, gradient, rp gradient, (13 more...)

arXiv.org Artificial Intelligence

2312.0871

Country:

North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

A Visual Guide to Learning Rate Schedulers in PyTorch

#artificialintelligence | Dec-6-2022, 17:25:10 GMT

Neural networks have many hyperparameters that affect the model's performance. One of the essential hyperparameters is the learning rate (LR), which determines how much the model weights change between training steps. In the simplest case, the LR value is a fixed value between 0 and 1. However, choosing the correct LR value can be challenging. On the one hand, a large learning rate can help the algorithm to converge quickly.
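As a concrete picture of moving from a fixed LR to a scheduled one, the sketch below implements a step decay rule in plain Python. It is a minimal illustration, not code from the linked guide: the function name `step_decay_lr` and the parameter values are assumptions. The rule matches what PyTorch documents for `torch.optim.lr_scheduler.StepLR` (multiply the LR by `gamma` every `step_size` epochs), but it is written here without PyTorch so it runs on its own.

```python
from typing import List


def step_decay_lr(epoch: int, base_lr: float, step_size: int, gamma: float) -> float:
    """Return the LR for a given epoch: base_lr scaled by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Assumed settings for illustration: start at 0.1 and halve every 3 epochs.
schedule: List[float] = [
    step_decay_lr(epoch, base_lr=0.1, step_size=3, gamma=0.5) for epoch in range(9)
]

for epoch, lr in enumerate(schedule):
    print(epoch, lr)
```

Starting with a larger rate and shrinking it later is one common way to get fast early progress while still taking small steps near the end of training.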

artificial intelligence, learning rate scheduler, machine learning, (8 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)
